ABSTRACT

We investigate the effect of intracluster correlation in two-stage
samples  on  the  ordinary  F  procedures  in  linear  models.  A  measure  is 
proposed  as  a  diagnostic  and  basis  for  correction  to  the  F  statistic.  A 
decomposition  of  this  measure  is  given  in  terms  of  the  contributions  of  the 
individual  regressors  and  their  cross  products.  For  known  intracluster 
correlation  the  proposed  correction  to  F  performs  very  well  in  the  numerical 
study.  For  unknown  intracluster  correlation  a  simple  alternative  to  the 
generalized  least  squares  procedure  is  proposed  and  is  shown  to  perform 
favorably in the simulation study.

SIGNIFICANCE AND EXPLANATION


Users  of  survey  data  often  ignore  the  effect  of  survey  design  on 
analysis.  This  may  be  due  to  the  unavailability  of  information  on  design  such 
as cluster labels. Another reason is that standard packages do not take into
account  the  design  effect.  We  study  the  effect  of  intracluster  correlation  in 
two-stage  sampling  on  the  validity  of  statistical  procedures  based  on  the  F 
statistic.  A  measure  is  proposed  as  a  diagnostic  and  basis  for  correction  to 
the  F  statistic.  The  proposed  correction  and  related  modifications  perform 
well  in  the  simulation  study.  We  also  explain  the  design  effect  in  terms  of 
the  individual  variables  and  their  cross  products. 

ON THE EFFECT OF TWO-STAGE SAMPLING ON THE F STATISTIC


C.  F.  J.  Wu,  D.  Holt  and  D.  Holmes* 


1.  Introduction

The  assumption  of  independent  and  identically  distributed  observations  which 
underlies  many  statistical  procedures  is  called  into  question  when  analyzing  complex 
survey  data.  The  population  structure,  and  particularly  the  existence  of  clusters  in 
two-stage  samples  which  usually  exhibit  positive  intracluster  correlation, 
invalidates  the  independence  assumption.  The  impact  of  this  in  regression  analysis 
has been investigated in the standard sample survey theory framework by Kish and
Frankel  (1974)  and  in  the  linear  model  framework  by  Campbell  (1979)  and  Scott  and 
Holt  (1982).  The  overall  picture  is  that  while  ordinary  least  squares  (OLS) 
procedures  are  unbiased,  but  not  fully  efficient,  for  estimation  of  the  regression 
coefficients,  serious  difficulties  can  arise  in  using  the  OLS  estimators  for  second 
order  terms.  Variances  of  the  OLS  estimators  for  the  regression  coefficients  can  be 
larger,  sometimes  much  larger,  than  the  usual  OLS  variance  expression  would  indicate 
and  estimators  for  the  variances  of  coefficient  estimators  do  not  take  this  into 
account.  This  leads  to  underestimation  of  variances  with  consequences  for  confidence 
intervals. 

This  paper  is  concerned  with  following  this  impact  through  to  the  F  statistic 
because  of  its  central  importance  to  hypothesis  tests  and  confidence  ellipsoids.  Our 
first aim is to investigate the effect of intracluster correlation on the F
statistic. We then seek modifications to the F statistic which will restore its
usual properties without needing the full set of information and numerical complexity
of the alternative generalized least squares (GLS) procedure. We also seek
diagnostic statistics which will identify when the ordinary F statistic is likely
to be affected and explore the various factors which contribute to this effect.
Finally we compare our alternative procedures with GLS.

Sections 2 and 3 contain the basic framework and theoretical development and lay
the groundwork for modifications to the F statistic. Section 4 considers examples
of one and two covariates as special cases of the general theory. Section 5 presents
numerical results for the case of two independent variables, which is the simplest
allowing  many  of  the  factors  to  be  explored.  A  further  modification  to  the  F 
statistic  is  proposed  in  Section  5  and  comparisons  made  with  the  iterative  GLS 
procedure  when  the  intracluster  correlation  coefficient  is  unknown.  The  proposed 
modifications  perform  much  better  than  the  OLS  procedures.  They  perform  almost  as 
well  as  the  GLS  procedures  for  large  values  of  the  intracluster  correlation  and 
better  than  the  GLS  for  small  values.  A  summary  of  the  numerical  and  simulation 
results  and  relevant  remarks  are  given  in  Section  6. 


2.  F  Statistic  Under  a  Regression  Model  for  Two-Stage  Samples 

Following  Campbell  (1974)  and  Scott  and  Holt  (1982),  we  utilize  a  regression 
model  with  an  error  structure  which  allows  for  intracluster  correlation  of  the 
residual  errors: 

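$$ y = X\beta + \varepsilon , \tag{2.1} $$

where there are n observations from a two-stage sample with c clusters drawn at the first stage of sampling and $m_\ell$ elements drawn from the $\ell$-th sampled cluster at the second stage, $n = \sum_{\ell=1}^{c} m_\ell$. Assume $\varepsilon$ is normal with mean zero and variance-covariance matrix $\sigma^2 V$. The sample observations are written in the natural order with the first $m_1$ elements from the first cluster and so on, and V is assumed to have a block diagonal form $\bigoplus_{\ell=1}^{c} V_\ell$ with

$$ V_\ell = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} . \tag{2.2} $$
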


If  no  account  is  taken  of  the  variance  structure,  the  standard  OLS  procedures 

are 

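$$ \hat\beta = (X^T X)^{-1} X^T y , \qquad \widehat{\mathrm{var}}(\hat\beta) = \hat\sigma^2 (X^T X)^{-1} , $$

where $\hat\sigma^2 = y^T (I - X(X^T X)^{-1} X^T) y/(n - k)$ and there are k explanatory variables.
The F statistic for hypothesis testing and confidence ellipsoids is

$$ F(\beta) = \frac{\|X\hat\beta - X\beta\|^2 / k}{\|y - X\hat\beta\|^2 / (n - k)} . \tag{2.3} $$
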


If the cluster labels are known, the natural approach is to use GLS or iterative GLS (when $\rho$ is unknown and must be estimated). We return to this alternative in Section 5. However, the standard OLS procedures and the F statistic are often used either because the cluster labels are unavailable (particularly when the survey data are used for secondary analysis) or because users of the survey data ignore the effects of $\rho$ on their analysis. Thus, to test $\beta = \beta_0$, the hypothesis is rejected at the $\alpha$ significance level if

$$ F(\beta_0) > F_\alpha(k, n - k) ; \tag{2.4} $$

and the associated $(1 - \alpha)$ confidence ellipsoid is

$$ \{\beta : F(\beta) \le F_\alpha(k, n - k)\} , \tag{2.5} $$

where $F_\alpha(k, n - k)$ is the upper $\alpha$ point of the F distribution with k and n - k degrees of freedom.


Under the model (2.1)-(2.2) the F statistic does not in general have an F distribution and the F procedure is invalid. The F test will not have true significance level equal to the nominal $\alpha$ value and the coverage property of the confidence ellipsoid will be similarly distorted. By writing $\delta = \sigma^{-1} V^{-1/2} \varepsilon$, F can be written as

$$ F = \frac{\delta^T V^{1/2} P V^{1/2} \delta / k}{\delta^T V^{1/2} (I - P) V^{1/2} \delta / (n - k)} , \tag{2.6} $$

where $\delta = (\delta_1, \ldots, \delta_n)^T$ are independent N(0,1) and $P = X(X^T X)^{-1} X^T$ is the projection matrix onto the column space of X. Apart from the two scalar factors k and n - k, the numerator and denominator in (2.6) are each separately weighted sums of independent chi-square random variables with the added complication that they are correlated. Thus, $\delta^T V^{1/2} P V^{1/2} \delta$ is distributed as $\sum_{i=1}^{k} \lambda_i \delta_i^2$, where the $\{\delta_i^2\}$ are independent $\chi^2_1$ and the $\lambda_i$ are the eigenvalues of PV. Similarly for the denominator the weights are the eigenvalues of (I - P)V.

The actual coverage probability of the ellipsoid (2.5) can be shown to be
$\mathrm{Prob}\{\delta^T V^{1/2} [P - k(n - k)^{-1} F_\alpha(k, n - k)(I - P)] V^{1/2} \delta \le 0\}$ (see (A.5)), which is not tractable.

We note that survey data sets are usually large and the denominator of F, $\hat\sigma^2$, has mean $\sigma^2 [n - \mathrm{tr}(PV)]/(n - k)$, where tr is the trace of a matrix (see Scott and Holt, 1982). Since n is large and tr(PV) is of the order of k, $\hat\sigma^2$ is nearly unbiased for $\sigma^2$. This suggests that the correlation between the numerator and denominator of F may be weak and its effect on the validity of F is small. This is borne out by the numerical results in Section 5.

A simple and revealing way of studying the effect of intracluster correlation on F is to approximate its numerator and denominator by constant multiples of $\chi^2_k$ and $\chi^2_{n-k}$. By matching the first moments the constants are tr(PV)/k and [n - tr(PV)]/(n - k) respectively. Because of the almost unbiasedness of the denominator for large n, we focus on tr(PV)/k. If it is substantially different from one, the distribution of the F statistic is not adequately approximated by the F distribution. For example, if n is large and $\mathrm{tr}(PV)/k > F_{\alpha_1}(k, n - k)/F_{\alpha_2}(k, n - k)$, $\alpha_2 > \alpha_1$, the ordinary F test with nominal level $\alpha_1$ has actual level at least $\alpha_2$. We therefore use tr(PV)/k as a measure of the effect of the intracluster correlation on the ordinary F procedure.

A better approximation to the true distribution of F can be obtained by approximating $\|X\hat\beta - X\beta\|^2$ and $\|y - X\hat\beta\|^2$ in F by $c_1 \chi^2_{\nu_1}$ and $c_2 \chi^2_{\nu_2}$, where $c_1$, $\nu_1$, $c_2$, $\nu_2$ are determined by matching the first two moments (Satterthwaite, 1946). However, $c_1$ and $\nu_1$ depend on $\mathrm{tr}[(PV)^2]$ (similarly $c_2$ and $\nu_2$ depend on $\mathrm{tr}[(V - PV)^2]$), which is not as readily available as tr(PV). Moreover, the additional gain in accuracy by this more refined approximation is small. Therefore it is not further pursued.

If $\rho = 0$ then V = I and tr(PV)/k = 1. In general the true covariance matrix of the OLS estimators for $\beta$ is given by $\sigma^2 (X^T X)^{-1} D$, where $D = (X^T V X)(X^T X)^{-1}$ has been termed the misspecification effect (Scott and Holt, 1982). If $X^T X$ or D were diagonal,

$$ k^{-1}\mathrm{tr}(PV) = k^{-1}\mathrm{tr}(D) = k^{-1} \sum_{j=1}^{k} \left[ \mathrm{var}(\hat\beta_j) / \mathrm{var}(\hat\beta_j \mid \rho = 0) \right] $$

would represent the average inflation (due to nonzero $\rho$) in variance for the OLS estimators. More generally $k^{-1}\mathrm{tr}(PV)$ often captures the main components of the variance inflation and may be termed the 'approximate misspecification effect'.

The term $k^{-1}\mathrm{tr}(PV)$ suggests a simple adjustment to F, whose properties are discussed in the next section.
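
The diagnostic tr(PV)/k can be computed directly once the cluster sizes and a value of the intracluster correlation are available. The following is a minimal sketch, not part of the original paper: the helper names `equicorrelated_blocks` and `trace_ratio`, the simulated covariate and the value 0.2 are illustrative assumptions.

```python
import numpy as np

def equicorrelated_blocks(cluster_sizes, rho):
    """Block-diagonal V with blocks (1 - rho) * I + rho * J, as in (2.2)."""
    n = sum(cluster_sizes)
    V = np.zeros((n, n))
    start = 0
    for m in cluster_sizes:
        V[start:start + m, start:start + m] = (
            (1 - rho) * np.eye(m) + rho * np.ones((m, m))
        )
        start += m
    return V

def trace_ratio(X, V):
    """Diagnostic tr(PV)/k, with P the projection onto the columns of X."""
    P = X @ np.linalg.solve(X.T @ X, X.T)
    return np.trace(P @ V) / X.shape[1]

# Hypothetical design: 10 clusters of 10 units, an intercept plus one
# centred covariate that carries a cluster-level component.
rng = np.random.default_rng(0)
sizes = [10] * 10
x = np.repeat(rng.normal(size=10), 10) + rng.normal(size=100)
X = np.column_stack([np.ones(100), x - x.mean()])

ratio_independent = trace_ratio(X, equicorrelated_blocks(sizes, 0.0))
ratio_clustered = trace_ratio(X, equicorrelated_blocks(sizes, 0.2))
print(ratio_independent, ratio_clustered)
```

With zero correlation the ratio is one; with positive correlation the intercept alone contributes 1 + (m - 1) times the correlation to tr(PV), so the ratio moves well above one in this design.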

3.  A  Modified  F  Statistic 

The foregoing discussion suggests the following simple modification to the F statistic

$$ F' = \frac{\|X\hat\beta - X\beta\|^2 / \mathrm{tr}(PV)}{\|y - X\hat\beta\|^2 / [n - \mathrm{tr}(PV)]} = \frac{k[n - \mathrm{tr}(PV)]}{\mathrm{tr}(PV)(n - k)}\, F . \tag{3.1} $$

We call (2.4) or (2.5) with F replaced by $F'$ a modified F procedure. Here we assume $\rho$ is known. Unknown $\rho$ will be considered in Section 5. It will be shown later that the modified F procedure (3.1) is almost exact in most situations in the simulation study.
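
As a rough illustration of (3.1), the sketch below computes the ordinary and the modified statistic for one simulated data set with known intracluster correlation. The function name `ordinary_and_modified_f`, the simulated data and the parameter values are assumptions made for the example only.

```python
import numpy as np

def ordinary_and_modified_f(y, X, beta0, V):
    """Return (F, F_mod, tr(PV)) for testing beta = beta0, per (2.3) and (3.1)."""
    n, k = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)
    t = np.trace(P @ V)                      # tr(PV)
    fitted = P @ y                           # X beta_hat
    num = np.sum((fitted - X @ beta0) ** 2)  # ||X beta_hat - X beta0||^2
    den = np.sum((y - fitted) ** 2)          # ||y - X beta_hat||^2
    F = (num / k) / (den / (n - k))
    F_mod = (num / t) / (den / (n - t))
    return F, F_mod, t

# Hypothetical data: c = 10 clusters of m = 10 units, known rho = 0.2.
rng = np.random.default_rng(1)
c, m, rho = 10, 10, 0.2
n = c * m
V = np.kron(np.eye(c), (1 - rho) * np.eye(m) + rho * np.ones((m, m)))
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x - x.mean()])
beta0 = np.array([1.0, 0.5])
y = X @ beta0 + np.linalg.cholesky(V) @ rng.normal(size=n)

F, F_mod, t = ordinary_and_modified_f(y, X, beta0, V)
F_iid, F_mod_iid, t_iid = ordinary_and_modified_f(y, X, beta0, np.eye(n))
print(F, F_mod, t)
```

When V is the identity, tr(PV) equals k and the two statistics coincide, which gives a quick sanity check on any implementation.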

For testing a subhypothesis or setting confidence regions for some linear combinations of the $\beta_i$'s, let the parameters of interest be $A\beta = (a_1^T\beta, \ldots, a_q^T\beta)^T$, where A is a q × k matrix of rank q < k. The ordinary F procedure (Draper and Smith, 1981) for $A\beta$ is based on the $F_A$ statistic and the critical value $F_\alpha(q, n - k)$,

$$ F_A = \frac{(A\hat\beta - A\beta)^T (X_A^T X_A)^{-1} (A\hat\beta - A\beta)/q}{\|y - X\hat\beta\|^2/(n - k)} , \qquad X_A = X(X^T X)^{-1} A^T . \tag{3.2} $$



Note that the denominator of $F_A$ is the same as that of F. By approximating the numerator of $F_A$ by a constant multiple of $\chi^2_q$ with the same first moment, a simple adjustment to $F_A$ is given by

$$ F'_A = \frac{(A\hat\beta - A\beta)^T (X_A^T X_A)^{-1} (A\hat\beta - A\beta)/\mathrm{tr}(P_A V)}{\|y - X\hat\beta\|^2/[n - \mathrm{tr}(PV)]} = \frac{q[n - \mathrm{tr}(PV)]}{\mathrm{tr}(P_A V)(n - k)}\, F_A , \tag{3.3} $$

where $P_A = X_A (X_A^T X_A)^{-1} X_A^T$ is the projection matrix onto the column space of $X_A$, which is of dimension q. The simulation study in Section 5 shows that the modified F procedure (3.3) is almost exact in most situations considered there.

Since for large n the denominator of $F_A$ is nearly unbiased for $\sigma^2$, the difference between $F_A$ and $F'_A$ is primarily due to the difference between $\mathrm{tr}(P_A V)$ and q. One may use $\mathrm{tr}(P_A V)/q$ as a measure of the effect of intracluster correlation on the $F_A$ procedure. A value larger than $F_{\alpha_1}(q, n - k)/F_{\alpha_2}(q, n - k)$, $\alpha_2 > \alpha_1$, indicates that the ordinary $F_A$ test with nominal level $\alpha_1$ has actual level at least $\alpha_2$.


The special case q = 1 deserves further attention. By writing A as a 1 × k vector $a^T$ and $X_A$ as an n × 1 vector w, $P_A = w w^T/\|w\|^2$ and

$$ \mathrm{tr}(P_A V) = \frac{w^T V w}{\|w\|^2} = \frac{a^T (X^T X)^{-1} X^T V X (X^T X)^{-1} a}{a^T (X^T X)^{-1} a} \tag{3.4} $$

is equal to

$$ \mathrm{var}(a^T\hat\beta)/\mathrm{var}(a^T\hat\beta \mid \rho = 0) . $$

Scott and Holt (1982) call this the misspecification effect (meff) for estimating $a^T\beta$. Since q = 1, the numerator of $F_A$ is a constant multiple of a $\chi^2_1$ variable. This explains our later empirical finding that the modified F procedure (3.3) works extremely well in the one-parameter case.
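
The identity behind (3.4) is easy to confirm numerically. The sketch below, which is an illustration rather than the authors' code, evaluates the correction factor once as a projection trace and once as a variance ratio; the function name `meff_for_contrast` and the simulated design are hypothetical.

```python
import numpy as np

def meff_for_contrast(X, V, a):
    """Two routes to tr(P_A V) when q = 1, following (3.4)."""
    G_inv = np.linalg.inv(X.T @ X)
    w = X @ G_inv @ a                       # X_A written as a vector
    via_projection = (w @ V @ w) / (w @ w)  # w'Vw / ||w||^2
    var_true = a @ G_inv @ X.T @ V @ X @ G_inv @ a  # var(a'beta_hat) / sigma^2
    var_ols = a @ G_inv @ a                         # the same under rho = 0
    return via_projection, var_true / var_ols

# Hypothetical non-orthogonal design with three columns.
rng = np.random.default_rng(2)
c, m, rho = 8, 6, 0.3
n = c * m
V = np.kron(np.eye(c), (1 - rho) * np.eye(m) + rho * np.ones((m, m)))
x = rng.normal(size=n)
z = 0.6 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x, z])
a = np.array([0.0, 1.0, -1.0])

trace_form, ratio_form = meff_for_contrast(X, V, a)
trace_iid, ratio_iid = meff_for_contrast(X, np.eye(n), a)
print(trace_form, ratio_form)
```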

The applicability of the proposed modified F procedures depends very much on the accessibility of the values tr(PV) and $\mathrm{tr}(P_A V)$. Let us consider a special situation where the k column vectors of X, denoted by $x_1, \ldots, x_k$, are orthogonal to each other. Then the projection matrix

$$ P = \sum_{j=1}^{k} x_j x_j^T / \|x_j\|^2 $$

and

$$ \mathrm{tr}(PV) = \sum_{j=1}^{k} x_j^T V x_j / x_j^T x_j = \sum_{j=1}^{k} \mathrm{var}(\hat\beta_j)/\mathrm{var}(\hat\beta_j \mid \rho = 0) $$

is the sum of misspecification effects for estimating the k orthogonal parameters $\beta_1, \ldots, \beta_k$. Similarly for testing or estimating q parameters out of $\beta_1, \ldots, \beta_k$, q < k, $\mathrm{tr}(P_A V)$ is the sum of the corresponding q meffs. The proposed procedures require only k meffs, which may be provided by the sampler, no matter how large n or c is. Note that the orthogonality condition on X is satisfied by balanced ANOVA and simple linear regression. The latter will be studied in more detail in Section 4.1. A simple model not satisfying the orthogonality condition will be considered in Section 4.2.


4.  Examples 


Here we allow the intracluster correlation coefficients $\rho_\ell$ in cluster $\ell$ to be possibly unequal.


4.1.  Simple  linear  regression 


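The i-th observation in the $\ell$-th cluster can be expressed as

$$ y_{\ell i} = \alpha + \beta (x_{\ell i} - \bar x_{..}) + \varepsilon_{\ell i} , \tag{4.1} $$

where $\bar x_{..}$ is the mean of $x_{\ell i}$ in the sample and $\varepsilon_{\ell i}$ satisfies the conditions (2.1)-(2.2). The correction factor tr(PV)/2 in $F'$, (3.1), can be computed as indicated above since the vectors $\mathbf{1} = (1, \ldots, 1)^T$ and $x = (x_{\ell i} - \bar x_{..})_{\ell, i}$ are orthogonal. That is,

$$ \tfrac{1}{2}\mathrm{tr}(PV) = \tfrac{1}{2}\left( \frac{\mathbf{1}^T V \mathbf{1}}{\mathbf{1}^T \mathbf{1}} + \frac{x^T V x}{x^T x} \right) = \tfrac{1}{2}(D_\alpha + D_\beta) \tag{4.2} $$

is an average of the overall meffs $D_\alpha$ and $D_\beta$ for estimating $\alpha$ and $\beta$. It can be shown that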


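$$ D_\alpha = \sum_{\ell=1}^{c} \frac{m_\ell}{n}\,(1 + (m_\ell - 1)\rho_\ell) \tag{4.3} $$

is a weighted average of the meff

$$ D_{\alpha,\ell} = 1 + (m_\ell - 1)\rho_\ell $$

for estimating $\alpha$ in cluster $\ell$, and by writing $T_{x,\ell} = \sum_{i=1}^{m_\ell} (x_{\ell i} - \bar x_{..})^2$ and $T_x = \sum_{\ell=1}^{c} T_{x,\ell}$,

$$ D_\beta = \frac{1}{T_x} \sum_{\ell=1}^{c} \left[ (1 - \rho_\ell) T_{x,\ell} + \rho_\ell m_\ell^2 (\bar x_{\ell .} - \bar x_{..})^2 \right] = \sum_{\ell=1}^{c} \frac{T_{x,\ell}}{T_x} \left[ 1 + \rho_\ell \left( \frac{m_\ell^2 (\bar x_{\ell .} - \bar x_{..})^2}{T_{x,\ell}} - 1 \right) \right] = \sum_{\ell=1}^{c} \frac{T_{x,\ell}}{T_x}\,(1 + (m_\ell - 1)\rho_\ell \rho_{\ell,x}) \tag{4.4} $$

is a weighted average of the meff

$$ D_{\beta,\ell} = 1 + (m_\ell - 1)\rho_\ell \rho_{\ell,x} $$

for estimating $\beta$ in cluster $\ell$, where

$$ \rho_{\ell,x} = \frac{1}{m_\ell - 1} \left( \frac{m_\ell^2 (\bar x_{\ell .} - \bar x_{..})^2}{T_{x,\ell}} - 1 \right) $$
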


can be regarded as a sample analog to the intracluster correlation of x in cluster $\ell$. The derivations of (4.3)-(4.4) become straightforward once the component $V_\ell$ of V is rewritten as $(1 - \rho_\ell) I_\ell + \rho_\ell J_\ell$, where $I_\ell$ is the identity matrix and $J_\ell$ the matrix of ones, both of order $m_\ell$. In the special case $m_\ell = m$ and $\rho_\ell = \rho$, $D_\alpha = 1 + (m - 1)\rho$ and $D_\beta = 1 + (m - 1)\rho\rho_x$, where

$$ \rho_x = \frac{1}{m - 1} \left( \frac{m \sum_{\ell} m (\bar x_{\ell .} - \bar x_{..})^2}{T_x} - 1 \right) $$

can be regarded as a sample analog to the overall intracluster correlation of x.
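
The closed forms (4.3)-(4.4) can be checked against the quadratic forms in (4.2). The sketch below does this for unequal cluster sizes and unequal correlations; the cluster sizes, correlations and function names are hypothetical choices for illustration, and every cluster is assumed to have at least two units.

```python
import numpy as np

def meffs_closed_form(x_clusters, rhos):
    """D_alpha and D_beta from the weighted averages in (4.3) and (4.4)."""
    n = sum(len(x) for x in x_clusters)
    grand_mean = sum(x.sum() for x in x_clusters) / n
    T_l = [np.sum((x - grand_mean) ** 2) for x in x_clusters]
    T = sum(T_l)
    D_alpha, D_beta = 0.0, 0.0
    for x, rho, T_xl in zip(x_clusters, rhos, T_l):
        m = len(x)
        # sample analog of the intracluster correlation of x in this cluster
        rho_lx = (m ** 2 * (x.mean() - grand_mean) ** 2 / T_xl - 1) / (m - 1)
        D_alpha += m / n * (1 + (m - 1) * rho)
        D_beta += T_xl / T * (1 + (m - 1) * rho * rho_lx)
    return D_alpha, D_beta

def meffs_direct(x_clusters, rhos):
    """D_alpha = 1'V1 / 1'1 and D_beta = x'Vx / x'x, as in (4.2)."""
    n = sum(len(x) for x in x_clusters)
    x = np.concatenate(x_clusters)
    x = x - x.mean()
    V = np.zeros((n, n))
    start = 0
    for xc, rho in zip(x_clusters, rhos):
        m = len(xc)
        V[start:start + m, start:start + m] = (
            (1 - rho) * np.eye(m) + rho * np.ones((m, m))
        )
        start += m
    one = np.ones(n)
    return one @ V @ one / n, x @ V @ x / (x @ x)

rng = np.random.default_rng(5)
sizes = [3, 5, 8, 4]
rhos = [0.05, 0.2, 0.1, 0.3]
x_clusters = [rng.normal(loc=rng.normal(), size=m) for m in sizes]

closed = meffs_closed_form(x_clusters, rhos)
direct = meffs_direct(x_clusters, rhos)
print(closed, direct)
```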


Since this is a relatively easy problem, we are able to do a more refined analysis. As shown in Section 2, the numerator of F, apart from the factor $\sigma^2$, is distributed as $\lambda_1 \chi^2_1 + \lambda_2 \chi^2_1$, where $\lambda_1$ and $\lambda_2$ are the eigenvalues of PV and can be determined by

$$ \begin{aligned} \lambda_1 + \lambda_2 &= \mathrm{tr}(PV) = D_\alpha + D_\beta , \\ \lambda_1^2 + \lambda_2^2 &= \mathrm{tr}[(PV)^2] = D_\alpha^2 + D_\beta^2 + 2(\mathbf{1}^T V x)^2/(n T_x) . \end{aligned} \tag{4.5} $$

For any fixed $\lambda_1 + \lambda_2$, which is independent of $\mathbf{1}^T V x$, it follows from (4.5) that $\lambda_1$ and $\lambda_2$ would be wider apart if $\mathbf{1}^T V x$ were not zero. One implication is that the approximation to $\lambda_1 \chi^2_1 + \lambda_2 \chi^2_1$ by $\tfrac{1}{2}(\lambda_1 + \lambda_2)\chi^2_2$ is less accurate for nonzero $\mathbf{1}^T V x$ and fixed $\lambda_1 + \lambda_2$, and therefore the effect of intracluster correlation on F is more pronounced. Note that

$$ \mathbf{1}^T V x = \sum_{\ell} m_\ell (1 + (m_\ell - 1)\rho_\ell)(\bar x_{\ell .} - \bar x_{..}) = 0 $$

if $(m_\ell - 1)\rho_\ell$ is constant. We conjecture that, for fixed $D_\alpha + D_\beta$, the meff on F is smaller if $m_\ell$ and $\rho_\ell$ are both constant.

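Since $\mathbf{1}$ and x are orthogonal, it is easy to see that the correction factor $\mathrm{tr}(P_A V)$ in $F'_A$ for testing $\alpha$ and for testing $\beta$ is respectively $D_\alpha$ and $D_\beta$. Consider now the problem of testing a more general parameter $c_1\alpha + c_2\beta$, which can be handled by the formula (3.4). The vector w in (3.4) is $n^{-1} c_1 \mathbf{1} + T_x^{-1} c_2 x$ and

$$ \mathrm{tr}(P_A V) = \left( \frac{c_1^2}{n} + \frac{c_2^2}{T_x} \right)^{-1} \left( \frac{c_1^2}{n} D_\alpha + \frac{c_2^2}{T_x} D_\beta + \frac{2 c_1 c_2}{n T_x}\, \mathbf{1}^T V x \right) , $$

which is not a weighted average of $D_\alpha$ and $D_\beta$ (unless $c_1 c_2\, \mathbf{1}^T V x = 0$), as one might expect.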


4.2.  Regression  with  two  covariate  variables 

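This is the simplest example in which the column vectors of X are not orthogonal. The response value $y_{\ell i}$ is related to the two covariates $x_{\ell i}$ and $z_{\ell i}$ by

$$ y_{\ell i} = \alpha + \beta (x_{\ell i} - \bar x_{..}) + \gamma (z_{\ell i} - \bar z_{..}) + \varepsilon_{\ell i} , $$

where $\varepsilon_{\ell i}$ satisfies (2.1)-(2.2). It is proved in Appendix A that

$$ \mathrm{tr}(PV) = D_\alpha + \frac{1}{1 - r^2}(D_\beta + D_\gamma) - \frac{2 r^2}{1 - r^2} D_{\beta\gamma} , \tag{4.7} $$
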

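where $D_\alpha$ and $D_\beta$ are the meffs for estimating $\alpha$ and $\beta$, given by (4.3) and (4.4), $D_\gamma$ is the meff for estimating $\gamma$ and is analogous to $D_\beta$ with $x_{\ell i}$ in (4.4) replaced by $z_{\ell i}$,

$$ r = x^T z/(\|x\|\,\|z\|) = \mathrm{corr}(x, z) $$

is the correlation coefficient between x and $z = (z_{\ell i} - \bar z_{..})_{\ell, i}$, and

$$ D_{\beta\gamma} = \sum_{\ell=1}^{c} \frac{T_{xz,\ell}}{T_{xz}}\, D_{\beta\gamma,\ell} \tag{4.8} $$

is a weighted average of

$$ D_{\beta\gamma,\ell} = 1 + (m_\ell - 1)\rho_\ell \rho_{xz,\ell} , $$

with

$$ T_{xz,\ell} = \sum_{i=1}^{m_\ell} (x_{\ell i} - \bar x_{..})(z_{\ell i} - \bar z_{..}) , \qquad T_{xz} = \sum_{\ell=1}^{c} T_{xz,\ell} , $$

and

$$ \rho_{xz,\ell} = \frac{1}{m_\ell - 1} \left( \frac{m_\ell^2 (\bar x_{\ell .} - \bar x_{..})(\bar z_{\ell .} - \bar z_{..})}{T_{xz,\ell}} - 1 \right) . $$

Unlike $D_\beta$ and $D_\gamma$, the weight in (4.8) may be negative. For $m_\ell = m$, $\rho_\ell = \rho$, $D_{\beta\gamma} = 1 + (m - 1)\rho\rho_{xz}$, where

$$ \rho_{xz} = \frac{1}{m - 1} \left( \frac{m \sum_{\ell} m (\bar x_{\ell .} - \bar x_{..})(\bar z_{\ell .} - \bar z_{..})}{T_{xz}} - 1 \right) $$

can be regarded as a sample analog to the intracluster correlation between the $x_{\ell i}$'s and the $z_{\ell i}$'s in the population.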

When r is small the third term of (4.7) is relatively small and tr(PV) is approximately the sum of the three meffs as in the orthogonal case already discussed. For example, when r = 0.2, $1 - r^2 = 0.96$ and $2r^2 = 0.08$ so that the contribution of the third term is negligible since $D_{\beta\gamma}$ is at most of the same order as $D_\beta$ and $D_\gamma$.

It should be pointed out that, although $D_\alpha$ and $D_\beta$ in the two-variable regression model are formally the same as in the simple linear regression model, they are actually different since the intracluster correlation coefficient $\rho_\ell$ varies with model. Typically, if the additional covariate variable is effective in explaining the variation due to the clustering variable, the $\rho_\ell$ in the augmented model is smaller.



For testing $\alpha$, the correction factor $\mathrm{tr}(P_A V)$ is $D_\alpha$ and for testing $\beta$ and $\gamma$, the correction factor $\mathrm{tr}(P_A V)$ is

$$ \frac{1}{1 - r^2}\,(D_\beta + D_\gamma - 2 r^2 D_{\beta\gamma}) , \tag{4.9} $$

the second and third terms of (4.7). This is easy to see because the projection matrices $P_A$ for the two problems are respectively the first term and the sum of the second and third terms of P in (A.1). For testing $\gamma$ only, the correction factor $\mathrm{tr}(P_A V)$ is computed as follows. To use the general formula (3.4), the vector $w = X(X^T X)^{-1}(0, 0, 1)^T$ in (3.4) turns out to be proportional to

$$ z - \frac{x^T z}{x^T x}\, x , $$

which is denoted by w in (A.1). Then

$$ \mathrm{tr}(P_A V) = \frac{w^T V w}{\|w\|^2} , $$

which is the third term of (A.2) and from formulae (A.3)-(A.4),

$$ \mathrm{tr}(P_A V) = \frac{1}{1 - r^2}\, [D_\gamma - 2 r^2 D_{\beta\gamma} + r^2 D_\beta] . \tag{4.10} $$

When r is close to zero, $\mathrm{tr}(P_A V)$ is approximately equal to $D_\gamma$ but is in general not equal to $D_\gamma$.
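
Formulas (4.7) and (4.10) can be compared with direct matrix computation. The sketch below is an illustrative check under assumed values (equal cluster sizes, one common correlation, simulated covariates); it takes $D_{\beta\gamma}$ to be the ratio $x^T V z / x^T z$, which is the quantity that the weighted average (4.8) expands.

```python
import numpy as np

# Hypothetical design: c = 10 clusters of m = 10 units, rho = 0.2.
rng = np.random.default_rng(3)
c, m, rho = 10, 10, 0.2
n = c * m
V = np.kron(np.eye(c), (1 - rho) * np.eye(m) + rho * np.ones((m, m)))

# Two correlated covariates, each with a cluster-level component.
x = np.repeat(rng.normal(size=c), m) + rng.normal(size=n)
z = 0.5 * x + np.repeat(rng.normal(size=c), m) + rng.normal(size=n)
x = x - x.mean()
z = z - z.mean()
one = np.ones(n)
X = np.column_stack([one, x, z])

D_alpha = one @ V @ one / n
D_beta = x @ V @ x / (x @ x)
D_gamma = z @ V @ z / (z @ z)
D_beta_gamma = x @ V @ z / (x @ z)
r2 = (x @ z) ** 2 / ((x @ x) * (z @ z))    # squared corr(x, z)

# Overall correction factor: direct trace versus formula (4.7).
P = X @ np.linalg.solve(X.T @ X, X.T)
trace_direct = np.trace(P @ V)
trace_formula = D_alpha + (D_beta + D_gamma - 2 * r2 * D_beta_gamma) / (1 - r2)

# Correction factor for testing gamma alone: (3.4) versus formula (4.10).
w = X @ np.linalg.solve(X.T @ X, np.array([0.0, 0.0, 1.0]))
gamma_direct = w @ V @ w / (w @ w)
gamma_formula = (D_gamma - 2 * r2 * D_beta_gamma + r2 * D_beta) / (1 - r2)
print(trace_direct, trace_formula, gamma_direct, gamma_formula)
```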

5.  Empirical Investigations
5.1.  Introduction 

Numerical results are presented for the case of two independent variables $E(y) = \alpha + \beta x + \gamma z$, the simplest model which allows us to explore the impact of various factors on the F statistic. Values of $x_{\ell i}$ and $z_{\ell i}$ for the i-th unit in the $\ell$-th cluster have been generated from the bivariate normal distribution with additional random effects components to allow for intracluster correlation on both x and z,

$$ x_{\ell i} = \mu_x + a_{x\ell} + e_{x\ell i} , \qquad z_{\ell i} = \mu_z + a_{z\ell} + e_{z\ell i} , $$

with $\rho_x = \sigma^2_{ax}/(\sigma^2_{ax} + \sigma^2_{ex})$ and $\rho_z = \sigma^2_{az}/(\sigma^2_{az} + \sigma^2_{ez})$. The two cluster effects $a_x$ and $a_z$ are correlated with coefficient $\rho_{xz}$ for the same cluster and similarly the two individual terms $e_x$ and $e_z$ for the same unit are correlated. Otherwise random terms are uncorrelated. For given values of the various parameters, values for x and z were generated for c = 10 clusters and m = 10 observations per cluster. In the linear model framework inference is conditional on the values of x and z and data sets were retained for use in the numerical investigations only if the specific data set generated exhibited the required structure (i.e., achieved estimates for $\rho_x$, $\rho_z$, etc. were close to the desired values).

For the initial results describing the actual significance levels of the F test when $\rho$ is known (Table 1), the results were obtained for given values of x and z without simulation by using the approximation described in Appendix B. Subsequent results on the performance of $F'$ when $\rho$ is unknown, on a further modification to F and on the GLS procedure for comparative purposes were obtained by computer simulation. Conditional on the values of x and z, values of $y_{\ell i}$ were generated with the required intracluster correlation structure using random effects terms

$$ y_{\ell i} = \alpha + \beta x_{\ell i} + \gamma z_{\ell i} + a_\ell + e_{\ell i} $$

and

$$ \rho = \sigma_a^2/(\sigma_a^2 + \sigma_e^2) , $$

where $\mathrm{var}(a_\ell) = \sigma_a^2$, $\mathrm{var}(e_{\ell i}) = \sigma_e^2$.


For  each  simulated  set  of  y  values,  each  of  the  test  procedures  was  carried 
out.  For  each  set  of  x  values  the  simulation  was  repeated  10,000  times  to  obtain 
estimates  of  the  actual  significance  level  of  each  test  procedure.  Thus,  actual 
significance  levels  presented  for  each  procedure  are  accurate  to  ±0.5%. 
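
The stated accuracy can be related to the binomial standard error of a simulated rejection rate. The short sketch below assumes that the ±0.5% figure corresponds to roughly two standard errors at a true level near 5%; that reading, and the helper name `mc_half_width`, are assumptions rather than statements from the paper.

```python
import math

def mc_half_width(p, reps, z=1.96):
    """Approximate 95% half-width for a rejection rate p estimated from reps runs."""
    return z * math.sqrt(p * (1 - p) / reps)

half_width = mc_half_width(0.05, 10_000)
print(round(100 * half_width, 2), "percentage points")
```

The half-width grows with the true level, so estimates of strongly inflated levels carry a somewhat wider margin than those near 5%.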

5.2.  Known $\rho$

Table 1 contains the actual significance level of the nominal 5% F test for various values of $\rho$, $\rho_x$, $\rho_z$ and corr(x,z). In this table $\rho_{xz} = 0$. The main points to note are as follows.

1.  $\mathrm{tr}(P_A V)/q$ is a good indicator of the level of distortion of the F test by intracluster correlation.

2.  When $\rho_x$, $\rho_z = 0$, the F test is unaffected as we would expect.

3.  Similarly when $\rho = 0$, there is no effect.

4.  For testing $\gamma = 0$ the strongest effect comes, as we would expect, from $\rho_z \neq 0$ with $\rho \neq 0$.

5.  Even when $\rho_z = 0$, there is an effect from $\rho_x \neq 0$, $\rho \neq 0$ when corr(x,z) is large. The correlation between x and z allows the intracluster correlation for the x variable to have an impact on the test for $\gamma = 0$ (the coefficient of the z variable). This effect is not as large as the direct effect of $\rho_z \neq 0$. This point can be explained by formula (4.10).

6.  The effects for testing $\beta = \gamma = 0$ tend to be larger than on the test for $\gamma = 0$. This is easily justified by comparing (4.9) and (4.10).

7.  Although not shown in the table, the actual significance level for $F'$ is always 5% when testing $\gamma = 0$ since the numerator of the F statistic is a simple multiple of $\chi^2_1$ and the modification is exact, restoring the properties of F entirely. For testing $\beta = \gamma = 0$ the approximation used in $F'$ is not perfect but the actual significance level achieved was in the range 5% to 7% in all cases.


TABLE 1: Actual significance level (in %) of the nominal 5% F test and the value of
$\mathrm{tr}(P_A V)/q$ (in brackets) for c = 10 clusters and m = 10 observations
per cluster when $\rho_{xz} = 0$.

Testing $\gamma = 0$

corr(x,z)

5 (1.0)   5 (.99)   5 (.98)   5 (.96)   5 (.92)
5 (1.0)   5 (.99)   5 (.97)   5 (.94)   4 (.88)
5 (1.0)   5 (1.01)   5 (1.03)   6 (1.06)   7 (1.11)

5 (1.0)   5 (1.0)   5 (1.0)
7 (1.21)   8 (1.24)   9 (1.32)
10 (1.41)   11 (1.47)   13 (1.64)
15 (1.82)   16 (1.94)   20 (2.28)
24 (2.65)   26 (2.88)   31 (3.56)

5 (1.0)   5 (.97)   5 (.95)   4 (.89)   3 (.79)
5 (1.0)   5 (1.01)   5 (1.02)   6 (1.04)   7 (1.08)
5 (1.0)   7 (1.13)   8 (1.25)   11 (1.50)   18 (2.00)


corr(x,z)

5 (1.0)   5 (1.0)   5 (1.0)
5 (.99)   5 (.99)   5 (1.01)
5 (.99)   5 (.98)   5 (1.01)
5 (.97)   5 (.97)   6 (1.03)
5 (.95)   4 (.93)   6 (1.05)

5 (1.0)   6 (1.07)   8 (1.15)   11 (1.30)   17 (1.59)
5 (1.0)   7 (1.09)   8 (1.18)   12 (1.37)   19 (1.74)
5 (1.0)   8 (1.15)   10 (1.30)   16 (1.60)   26 (2.19)
5 (1.0)   7 (1.09)   8 (1.19)   12 (1.37)   20 (1.74)
5 (1.0)   7 (1.01)   9 (1.22)   13 (1.43)   22 (1.86)
5 (1.0)   8 (1.15)   10 (1.30)   16 (1.60)   27 (2.20)

In practice it is unlikely that $\rho_{xz} = 0$ since for many situations one would expect that the cluster effects for x and z are derived from a common source or influence. In this case the cluster effects for x and z would be positively correlated. Figure 1 presents graphically an extension of the pattern of results in Table 1 for testing $\gamma = 0$ to include $\rho_{xz} \neq 0$.

We note the following points.

1.  When corr(x,z) = 0, the value of $\rho_{xz}$ makes no difference. This is because the only term involving $\rho_{xz}$ in (4.10) is zero if corr(x,z) = 0.

2.  In general, the effect on the F test is accentuated if there is a strong difference between $\rho_{xz}$ and corr(x,z). High values of |corr(x,z)| show particularly strong effects when associated with $\rho_{xz} = 0$. This is because the second term of (4.10) is negative for $\rho_{xz} > 0$ and will have a larger effect in reducing $\mathrm{tr}(P_A V)$ when |corr(x,z)| is large. When $\rho_{xz} = 0$, the second term in (4.10) is zero and there is no reduction to $\mathrm{tr}(P_A V)$.

3.  The effects described here, of the differential effect of $\rho_{xz}$, have a smaller impact than the direct effects of $\rho_z \neq 0$ and $\rho \neq 0$. This is obvious from (4.10) since $D_{\beta\gamma}$, which involves $\rho_{xz}$, has a coefficient proportional to $r^2 = \mathrm{corr}^2(x,z)$ (< 1) and $D_\gamma$, which involves $\rho_z$ and $\rho$, has coefficient 1.

5.3.  Unknown $\rho$

In general $\rho$ is unknown and must be estimated in order to modify the F statistic or alternatively to use GLS. As a first approximation, which requires no iteration, we may use OLS to estimate the regression coefficients and then obtain an estimate of $\rho$ from the residuals

$$ (n - k)\hat\sigma^2 = y^T (I - P) y . $$

Let $P_{B\ell} = \mathbf{1}\mathbf{1}^T/m$, where $\mathbf{1}^T = (1, \ldots, 1)$ of length m, $P_B = \bigoplus_{\ell=1}^{c} P_{B\ell}$ and $P_W = I - P_B$. Here $P_B$ and $P_W$ are symmetric, idempotent projection matrices and are orthogonal to each other. Now

$$ (n - k - c + 1)\hat\sigma_e^2 = y^T (I - P) P_W y $$

and we may use $\hat\rho = 1 - \hat\sigma_e^2/\hat\sigma^2$. Typical simulation results for testing $\gamma = 0$ are presented in Table 2. The adjustment to F using $\hat\rho$ is better than using the unadjusted F statistic but not as good as when $\rho$ is known.

Part of the problem with using $\hat\rho$ is that it is biased not simply because it is a ratio estimator but also because $\hat\sigma_e^2$ and $\hat\sigma^2$ are biased. Holt and Scott (1981) show that $\hat\sigma^2$ is biased since

$$ (n - k) E(\hat\sigma^2)/\sigma^2 = \mathrm{tr}[(I - P)V] = n - k - \rho\{m\,\mathrm{tr}(P P_B) - k\} \le n - k . \tag{5.1} $$

In practice the extra term is small. For example, with m = c = 10, n = 100, k = 3, $\rho = 0.1$, the downward bias in $\hat\sigma^2$ is about 2% of $\sigma^2$. Similarly,


the bias of $\hat\rho$. Also, even if $\hat\sigma^2$ and $\hat\sigma_e^2$ were unbiased, then even small coefficients of variation in the estimators would result in large fluctuations in $\hat\rho$.
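
The identity in (5.1) and the size of the bias can be checked numerically for the stated configuration m = c = 10, k = 3 and intracluster correlation 0.1. The covariates below are simulated purely for illustration, so the exact bias depends on that hypothetical design.

```python
import numpy as np

rng = np.random.default_rng(4)
c, m, rho, k = 10, 10, 0.1, 3
n = c * m

P_B = np.kron(np.eye(c), np.ones((m, m)) / m)   # between-cluster projection
V = (1 - rho) * np.eye(n) + rho * m * P_B       # equicorrelated blocks

# Hypothetical design: intercept plus two covariates with cluster components.
x = np.repeat(rng.normal(size=c), m) + rng.normal(size=n)
z = np.repeat(rng.normal(size=c), m) + rng.normal(size=n)
X = np.column_stack([np.ones(n), x, z])
P = X @ np.linalg.solve(X.T @ X, X.T)

lhs = np.trace((np.eye(n) - P) @ V)                  # tr[(I - P)V]
rhs = n - k - rho * (m * np.trace(P @ P_B) - k)      # right-hand side of (5.1)
relative_bias = 1 - lhs / (n - k)                    # downward bias as a fraction
print(lhs, rhs, relative_bias)
```

Because tr(P P_B) lies between 1 and k when X contains an intercept, the relative bias in this configuration is bounded between roughly 0.7% and 2.8%, which is consistent with the figure of about 2% quoted above.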

Equations (5.1) and (5.2) suggest a further adjustment to the F statistic by adjusting both $\hat\sigma^2$ and $\hat\sigma_e^2$ to yield approximately unbiased estimates of $\sigma^2$ and $\sigma_e^2$ in a two-stage process.

1.  Use the OLS residuals to obtain a first estimate $\hat\rho$ as described above.

2.  Replace $\rho$ in (5.1) and (5.2) by $\hat\rho$ to obtain approximately unbiased estimates of $\sigma^2$ and $\sigma_e^2$ and hence a new estimate $\tilde\rho$ of $\rho$.

3.  Use this estimate, $\tilde\rho$, in adjusting the F statistic as described for $F'$.

We call this statistic $F''$.
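
Step 1 of the procedure can be sketched as follows for equal cluster sizes. This is an illustration under stated assumptions, not the authors' program: the function name `one_step_rho` and the simulated data are hypothetical, the within-cluster quantity is taken in the form $y^T(I - P)P_W y$, and the bias corrections of step 2 are not implemented because they depend on (5.2).

```python
import numpy as np

def one_step_rho(y, X, c, m):
    """First-stage estimate rho_hat = 1 - sigma_e^2_hat / sigma^2_hat from OLS residuals.

    Assumes c clusters of equal size m, stored cluster by cluster.
    """
    n, k = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat                        # (I - P) y
    cluster_means = y.reshape(c, m).mean(axis=1)
    y_within = y - np.repeat(cluster_means, m)      # P_W y
    sigma2_hat = resid @ resid / (n - k)
    sigma2_e_hat = resid @ y_within / (n - k - c + 1)
    return 1 - sigma2_e_hat / sigma2_hat

# Hypothetical simulation with true rho = 0.5 (sigma_a^2 = sigma_e^2 = 0.5).
rng = np.random.default_rng(6)
c, m = 200, 5
n = c * m
x = np.repeat(rng.normal(size=c), m) + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
a = np.repeat(rng.normal(scale=np.sqrt(0.5), size=c), m)
e = rng.normal(scale=np.sqrt(0.5), size=n)
y = 1.0 + 2.0 * x + a + e

rho_hat = one_step_rho(y, X, c, m)

# Degenerate check: responses constant within clusters give rho_hat = 1.
y_const = np.repeat(np.arange(4.0), 3)
rho_const = one_step_rho(y_const, np.ones((12, 1)), 4, 3)
print(rho_hat, rho_const)
```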

This  procedure  is  a  two-stage  process  (although  it  could  be  iterated)  and  is 
approaching  the  complexity  of  the  iterative  GLS  procedure.  Computer  simulations  were 
carried  out  to  estimate  the  actual  significance  level  of  this  procedure  under  various 
conditions  and  also  to  compare  with  the  iterative  GLS  procedure.  Figure  2  shows  the 
actual significance levels for F" and GLS for various situations. When the true
value of $\rho$ is very small (0.01), the GLS procedure has convergence problems and the
significance  levels  reported  are  a  slight  underestimate  since  they  are  based  only  on 
those  cases  where  convergence  was  achieved. 

The  main  points  to  note  are  as  follows: 

1. Both the F" procedure and GLS remove the substantial impact on the ordinary
F statistic when $\rho$ is large, although at the cost of numerical complexity
and a slight increase in the significance level when $\rho$ is very small (0.01).

2. The F" procedure is superior to the GLS procedure for small (0.01) and
moderate (0.1) values of $\rho$.

3. The achieved significance levels are above 5% but the remaining distortion
is small. The worst situations are when corr(x, z) is strong.

4. Only one choice of values of $\rho_x$ and $\rho_z$ is reported in Figure 2 but other
values confirm the same pattern.
5.4. Unequal intracluster correlations in different clusters


The theory in Section 4 allows for the possibility that $\rho$ may vary across
clusters and we conjectured that variation in $\rho$ would result in an increased
distortion to the F statistic. To explore this question we have carried out
further simulations which are from a model even more general than that used in
Section 4. The previous simulation procedure was modified so that the random effect
generated for clusters was multiplied by a constant w for one-half of the clusters
and left as before for the other half. Thus, writing $\sigma_b^2$ for the variance of the
cluster random effect and $\sigma_e^2$ for the remaining error variance, for half of the
clusters

$$\operatorname{var}(y \mid x) = \sigma_b^2 + \sigma_e^2, \qquad \rho_i = \sigma_b^2/(\sigma_b^2 + \sigma_e^2) = \rho_1, \text{ say};$$

for the other half of the clusters

$$\operatorname{var}(y \mid x) = w^2\sigma_b^2 + \sigma_e^2, \qquad \rho_i = w^2\sigma_b^2/(w^2\sigma_b^2 + \sigma_e^2) = \rho_2, \text{ say}.$$

Thus the simulations allow for unequal $\rho_i$ and also unequal variances of the error
terms from the model.

Table 3 contains the actual significance levels of the various test procedures
for the case when $\rho_x = .48$, $\rho_z = .49$ and $\rho_{xz} = .42$, and for various values of
$\rho_1$ and $\rho_2$. The table also contains corresponding results for the case of a common
value $\rho$ for all clusters which was chosen to yield the same overall intracluster
correlation.
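
The relation between w, the two cluster-level correlations and the common value can be sketched in a few lines. The first function follows directly from multiplying the cluster random effect by w. The second is an assumption: it takes "the same overall intracluster correlation" to mean that the between-cluster variance is averaged over the two halves of the clusters. Both function names are hypothetical.

```python
def rho_after_scaling(rho1, w):
    """Intracluster correlation when the cluster effect is multiplied by w."""
    ratio = rho1 / (1 - rho1)          # between/within variance ratio
    scaled = w * w * ratio
    return scaled / (scaled + 1)

def pooled_rho(rho1, w):
    """Assumed common rho: between-cluster variance averaged over both halves."""
    ratio = rho1 / (1 - rho1)
    average = 0.5 * (1 + w * w) * ratio
    return average / (average + 1)

for w, rho1 in [(2, .3), (4, .3), (2, .1), (4, .1), (2, .05), (4, .05)]:
    print(w, rho1, round(rho_after_scaling(rho1, w), 2), round(pooled_rho(rho1, w), 2))
```

Under these assumptions the printed values agree to two decimals with the second correlation and the common correlation reported for each (w, first correlation) pair in Table 3.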

The main points to note are as follows:

1. The F test is slightly more distorted for unequal $\rho_i$, although the main
effect comes from $\rho \ne 0$ rather than the variation in $\rho_i$.

2. Both the modified procedure F" and the GLS procedure continue to perform
reasonably well, although there is slightly more distortion than with constant $\rho$.
The GLS procedure seems slightly less affected than F".
TABLE 3: Comparison of actual significance levels (%) for various tests, nominal 5%
level, for equal and unequal $\rho_i$; $\rho_x = .48$, $\rho_z = .49$, $\rho_{xz} = .42$, and
m = c = 10; testing $\gamma = 0$.

                                                    Test
                   w    rho_1   rho_2   rho      F      F"     GLS

Unequal rho_i      2    .3      .63              33.6   8.1    6.0
Equal rho_i                             .52      31.6   7.4    6.0

Unequal rho_i      4    .3      .87              45.4   8.9    5.7
Equal rho_i                             .78      41.7   7.7    5.7

Unequal rho_i      2    .1      .31              17.9   7.7    7.5
Equal rho_i                             .22      17.2   6.9    7.1

Unequal rho_i      4    .1      .64              33.2   8.5    6.1
Equal rho_i                             .49      30.3   7.4    6.1

Unequal rho_i      2    .05     .17              12.1   7.0    8.4
Equal rho_i                             .12      11.7   6.8    8.2

Unequal rho_i      4    .05     .46              24.0   8.2    7.0
Equal rho_i                             .31      22.0   7.1    6.8

Conclusion

It is clear that the ordinary F test may be seriously distorted in two-stage
sampling when there is positive intracluster correlation. Our theoretical results
show  the  importance  of  tr(Pj^V)  as  a  diagnostic  and  basis  for  correction  to  the  F 
statistic.  For  cases  where  there  is  low  correlation  between  the  regressor  variables, 
tr(P^V)  is  seen  to  be  approximated  by  the  sum  of  the  meffs  for  the  variables  involved 
in  any  subset  of  regressors.  A  general  ANOVA-type  decomposition  for  tr(P^V)  is 
given  in  terms  of  the  contributions  of  the  individual  regressors  and  their  cross 
products. The numerical results given show the possible levels of distortion to the
significance level of the nominal 5% test for the case when there are 10 observations
per cluster. Larger cluster sample sizes will lead to greater effects and vice
versa.

When $\rho$ is known (or when the sample size and number of clusters are so large
that $\rho$ is very accurately estimated) a simple modification to the F statistic
seems to work well and to provide a test procedure of approximately the correct
significance level.

When $\rho$ is unknown and must be estimated, we note the large effect on $\hat\rho$ which
comes from relatively small variations in the variance estimates, including
$\hat\sigma^2$, from which it is formed. This, perhaps, explains the
common practice in the sample survey literature of pooling estimates of deff (and in
a similar way the correlated components of response variance) to achieve some
stability. In this case the usual alternative is to use GLS but for small values
of $\rho$ (less than 0.1) we have suggested an alternative which is numerically simpler
and seems to work better in practice. In the survey context, experience suggests
that $0 < \rho < 0.1$ is a likely range of possible values and so the F" procedure
suggested here is a realistic alternative to GLS. In our view, larger values of $\rho$
often suggest an inadequately specified model so that the first step should be to
introduce additional explanatory variables which account for some of the between-
cluster variation rather than simply accepting such a high value of $\rho$ and modifying
the F statistic.

The final set of numerical results allows for a situation with unequal within-
cluster variances and correlations, which is more general than the population model
used in the theory. The limited numerical results presented suggest that these
sources of variation between clusters increase the distortion of the usual F
procedure. Both the GLS and the alternative modification to F continue to work
reasonably well.
Decompose z into two orthogonal components, one along x and one orthogonal to x,
and denote its second component by w, so that $w = z - (x^T z/\|x\|^2)\,x$. Then the
projection matrix P can be expressed simply as the sum of the projections onto
J, x and w, labelled (A.1), since J, x and w are mutually orthogonal. From (A.1),
tr(PV) is a sum of three terms, whose first and second terms are the contributions
of J and x, the second being $D_x$. To compute the third term, note that

$$w^T V w = z^T V z + \frac{(x^T z)^2}{\|x\|^4}\, x^T V x - 2\,\frac{x^T z}{\|x\|^2}\, x^T V z
          = \|z\|^2 D_z + \frac{(x^T z)^2}{\|x\|^2}\, D_x - 2\,\frac{x^T z}{\|x\|^2}\, x^T V z,$$

$$\|w\|^2 = \|z\|^2 - \frac{(x^T z)^2}{\|x\|^2}.$$

Formula (4.7) follows from (A.2)-(A.4).
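
The two identities for w used in this appendix can be verified numerically. The sketch below is an illustration only: it assumes an equicorrelated within-cluster V, takes $D_x = x^T V x/\|x\|^2$ and $D_z = z^T V z/\|z\|^2$, and uses simulated x and z; the helper names `dot`, `vform` and `check_decomposition` are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def vform(u, v, m, rho):
    """u'Vv for the assumed V = (1 - rho) I + rho * blockdiag(ones(m, m))."""
    totals = sum(sum(u[j:j + m]) * sum(v[j:j + m]) for j in range(0, len(u), m))
    return (1 - rho) * dot(u, v) + rho * totals

def check_decomposition(m=5, c=8, rho=0.2, seed=1):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(m * c)]
    z = [0.6 * xi + rng.gauss(0, 1) for xi in x]       # correlated with x
    xx, zz, xz = dot(x, x), dot(z, z), dot(x, z)
    w = [zi - (xz / xx) * xi for xi, zi in zip(x, z)]  # part of z orthogonal to x
    d_x = vform(x, x, m, rho) / xx
    d_z = vform(z, z, m, rho) / zz
    lhs = vform(w, w, m, rho)
    rhs = zz * d_z + (xz ** 2 / xx) * d_x - 2 * (xz / xx) * vform(x, z, m, rho)
    # Also return both sides of the norm identity and the inner product w'x.
    return lhs, rhs, dot(w, w), zz - xz ** 2 / xx, dot(w, x)

print(check_decomposition())
```

The first two returned values should coincide, as should the third and fourth, and the last should be zero up to rounding error.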


Appendix  B.  Approximation  to  the  true  significance  level  of  F. 

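From (2.6), the true significance level of F is

$$\operatorname{Prob}\{F > F_\alpha(k, n-k)\} = \operatorname{Prob}\{\delta^T V^{1/2}[P - d(I - P)]V^{1/2}\delta > 0\}, \qquad (A.5)$$

where $\delta \sim N(0, I)$ and $d = k(n - k)^{-1}F_\alpha(k, n - k)$. Let
$\lambda_1 \ge \cdots \ge \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_s > \lambda_{s+1} \ge \cdots \ge \lambda_n$
be the n eigenvalues of $[P - d(I - P)]V$. Then (A.5) equals

$$\operatorname{Prob}\Big\{\sum_{1}^{r} \lambda_i\zeta_i \Big/ \sum_{s+1}^{n} |\lambda_i|\zeta_i > 1\Big\}, \qquad (A.6)$$

where the $\zeta_i$ are independent $\chi^2_1$ random variables.

Use the approximation

$$\sum_{1}^{r} \lambda_i\zeta_i \approx a\chi^2_u, \qquad \sum_{s+1}^{n} |\lambda_i|\zeta_i \approx b\chi^2_v, \qquad (A.7)$$

where

$$a = \sum_{1}^{r}\lambda_i^2 \Big/ \sum_{1}^{r}\lambda_i, \qquad u = \Big(\sum_{1}^{r}\lambda_i\Big)^2 \Big/ \sum_{1}^{r}\lambda_i^2,$$

$$b = \sum_{s+1}^{n}\lambda_i^2 \Big/ \sum_{s+1}^{n}|\lambda_i|, \qquad v = \Big(\sum_{s+1}^{n}|\lambda_i|\Big)^2 \Big/ \sum_{s+1}^{n}\lambda_i^2$$

are obtained by matching the first two moments. The approximation in (A.7) is known
to be very accurate. Now (A.6) can be approximated by

$$\operatorname{Prob}\{a\chi^2_u / b\chi^2_v > 1\} = \operatorname{Prob}\{F(u, v) > bv/(au)\},$$

which can be evaluated from the F distribution. The problem of non-integral
degrees of freedom u and v is handled by interpolation.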
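
The moment-matching step of this appendix is short enough to sketch directly. The code below is an illustrative reading of (A.6) and (A.7), not the authors' program: it takes a list of eigenvalues as given, returns the degrees of freedom and the threshold for the approximating F probability, and includes a small Monte Carlo routine for cross-checking. All function names are hypothetical.

```python
import random

def two_moment_match(weights):
    """Scale and df so that scale * chi2(df) matches the mean and variance
    of sum(weight_i * chi2_1) for positive weights."""
    s1 = sum(weights)
    s2 = sum(wt * wt for wt in weights)
    return s2 / s1, s1 * s1 / s2

def f_tail_setup(eigenvalues):
    """Return (u, v, threshold): the level is approximated by
    Prob{F(u, v) > threshold} with threshold = b*v / (a*u)."""
    pos = [lam for lam in eigenvalues if lam > 0]
    neg = [-lam for lam in eigenvalues if lam < 0]
    a, u = two_moment_match(pos)
    b, v = two_moment_match(neg)
    return u, v, (b * v) / (a * u)

def simulate_level(eigenvalues, reps=20000, seed=0):
    """Monte Carlo estimate of Prob{sum(lam_i * zeta_i) > 0}, zeta_i ~ chi2_1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = sum(lam * rng.gauss(0, 1) ** 2 for lam in eigenvalues)
        hits += total > 0
    return hits / reps

# Made-up eigenvalues for illustration; zeros drop out of both sums.
lams = [2.0, 2.0, 2.0, 0.0, -1.0, -1.0, -1.0, -1.0]
print(f_tail_setup(lams), simulate_level(lams, reps=2000))
```

When all positive eigenvalues are equal and all negative ones are equal, as in this made-up example, the approximation is exact. In place of interpolation, a routine that accepts fractional degrees of freedom (for instance `scipy.stats.f.sf`, if SciPy is available) could be used to evaluate the F tail probability.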